Fingerprint analysis reveals sources of petroleum hydrocarbons in soils of different geographical oilfields of China and its ecological assessment

The distribution and characteristics of petroleum in soils of three geographically distinct oilfields in China, Shengli Oilfield (SL), Nanyang Oilfield (NY), and Yanchang Oilfield (YC), were investigated. The average concentration of total petroleum hydrocarbons (TPHs) followed the order SL Oilfield > NY Oilfield > YC Oilfield. Fingerprint analysis of the petroleum contamination level and source was conducted using geochemical indices of n-alkanes and PAHs, such as the ratio of low to high molecular weight (LMW/HMW) hydrocarbons, the ratios of n-alkanes to pristane or phytane (C17/Pr, C18/Ph), and the ratio of anthracene to anthracene plus phenanthrene [Ant/(Ant + Phe)]. Soils adjacent to working oil wells had a higher LMW/HMW ratio, indicating new petroleum input. The oil contamination found in grassland soils might result from rainfall runoff. Petroleum, petroleum combustion, and biomass combustion were the dominant PAH sources in soils collected from oil exploitation areas, petrochemical-related sites, and farmland and grassland, respectively. Petroleum control strategies were proposed for the soils of each oilfield. The potential ecological risk of PAHs was assessed according to the toxic equivalent quantity (TEQ) of seven carcinogenic PAHs. The results showed high, medium, and low ecological risk in petro-related area soils, grassland soils, and farmland soils, respectively. High ecological risk persisted in abandoned oil well areas over 15 years of abandonment and was basically stable after 5 years. This study provides critical insight into ecological risk management and source control of petroleum contamination.


Materials and methods
Research sites. The Shengli Oilfield (SL), Nanyang Oilfield (NY), and Yanchang Oilfield (YC) are three typical large oilfields located in the east, center, and west of China, respectively. SL Oilfield is located in the Yellow River delta of Shandong province and is the second largest oil production base in China. Its oil resources are estimated at 1.45 × 10⁶ million tons and its exploration area is 1.94 × 10⁵ km². The oil exploitation area Gudao is an efficient ecological economic zone of Dongying city, Shandong province. Due to its unique geographical location and natural environment, this area is widely contaminated by petroleum hydrocarbons (PHs) [16]. NY Oilfield is located in the Nanyang Basin of Henan province; its oil resources are estimated at 3.40 × 10² million tons and its exploration area is 9.58 × 10⁴ km² [17]. It has been exploited for nearly 50 years, since 1972. YC Oilfield is located in the loess plateau region in the northern part of Shaanxi province and has a long history: the first oil well in China was drilled there in 1907 [18]. The crude oils of YC, NY, and SL Oilfields are typical light, medium, and heavy oils, respectively, with different density, API gravity, acidity, and sulfur content (Supporting Information, Table S1). Oil exploitation and the development of the petroleum industry have exacerbated petroleum contamination of these oilfield soils to varying degrees.
Sample collection. Surface soils (0-20 cm) were collected from the oilfields and their surrounding areas from September to December 2019. The study areas were divided into five characteristic zones, viz. SL Oilfield (zone 1), NY Oilfield (zones 2 and 3), and YC Oilfield (zones 4 and 5) (Fig. 1). A total of 47 surface soils were collected from SL Oilfield (S1-S12), NY Oilfield (N1-N10), and YC Oilfield (Y1-Y25).
To survey regional pollution within an oilfield, a grid sampling method was applied in YC Oilfield (Y6-Y25), with a grid area of 800 m × 2000 m. Five subsamples were thoroughly mixed by coning and quartering to make each composite sample. Soil samples were placed in brown bottles, transported to the laboratory in an icebox, and stored at 4 °C until analysis. Descriptions of the sampling locations were recorded at the time of sampling (Supporting Information, Table S2). The geographic coordinates of the sampling locations were recorded with a GPS unit (Garmin GPS Map 76CSX).
Determination of oil components. Extraction and purification. Oil components were Soxhlet extracted and then purified using a silica gel-alumina column as described previously [19]. In brief, a 5 g aliquot of moist soil (after deduction of moisture content) was Soxhlet-extracted for 12 h with 120 mL dichloromethane at 54 °C [20]. Extracts were eluted through a glass column (20 mm × 400 mm) containing pre-rinsed activated silica gel and neutral alumina (12 g:6 g, soaked with hexane). The TPHs were purified with 100 mL dichloromethane, and saturated hydrocarbons (SHs) and aromatic hydrocarbons (AHs) were separated using the following sequence of solvents: 20 mL of hexane, then 70 mL of hexane and dichloromethane (1:1). The separated components were concentrated to dryness on a rotary evaporator (Changcheng, Beijing, China) and then redissolved to 1 mL with hexane. The polar components were eluted with 100 mL dichloromethane and determined gravimetrically [21]. A 100 µL aliquot of four surrogate standards (n-hexane-d14, n-undecane-d24, phenanthrene-d10, and benz[a]anthracene-d12; each at 2 µg/mL) was added to the soil samples before extraction to estimate the extraction efficiency of n-alkanes and 16 PAHs, respectively.
Quality assurance and control. To ensure the accuracy and reliability of the analytical data, blanks, substrates, and parallel samples were analyzed. Each batch of extracted samples contained a method blank, which identified any external contamination introduced during sample extraction and cleanup. Solvent blanks were included in each set of samples to test for carryover and background contamination. Standards of known concentration were analyzed with every 10 samples to recalibrate the retention times and peak areas of the compounds and so ensure qualitative and quantitative accuracy.
The PH concentrations in the solvent and method blanks were all lower than the limit of quantification (LOQ), which was defined as the concentration at which the target peak is five times the signal of the solvent blank chromatograms. The method detection limit (MDL) was defined as the concentration giving a signal-to-noise ratio of 3 in the chromatograms of the blank sample. The MDLs of TPHs (C10-C40), SHs, and AHs were all 1.0 mg/kg, and the MDLs of n-alkanes and 16 PAHs were 0.50-1.21 μg/kg and 0.67-1.68 μg/kg, respectively. The mean recoveries of the surrogate standards of n-alkanes and 16 PAHs were 87.1-112.4% and 82.3-114.2%, respectively (Supporting Information, Table S3).
Ecological risk assessment of PAHs. The toxic equivalent quantity (TEQ) was used to assess the ecological risk derived from PAHs, and toxic equivalent factors (TEFs) were introduced to weight the contribution of each PAH to that risk [24]. Because the sampling environments differed greatly, the soil sites were classified into three categories: petro-related area soils, farmland soils, and grassland soils. BaP was used as the reference compound, and the TEQBaP values of the PAHs were calculated by the equation $\mathrm{TEQ}_{\mathrm{BaP}} = \sum_{i=1}^{n} C_i \cdot \mathrm{TEF}_i$, where i indexes the i-th PAH, n is the number of PAHs included in the sum, C_i is the concentration of the i-th PAH (μg/kg), TEF_i is the toxic equivalent factor of the i-th PAH, and TEQBaP is the toxic equivalent quantity based on BaP (μg/kg). TEQBaP7, calculated from the seven carcinogenic PAHs (BaA, Chr, BbF, BkF, BaP, InP, and DBA), was compared with TEQBaP16, calculated from all 16 PAHs.
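The TEQ sum defined above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the TEF values in `EXAMPLE_TEF` are placeholder assumptions (the text cites a reference for its TEFs but does not list them), and the function name `teq_bap` and the sample concentrations are hypothetical.

```python
from typing import Iterable, Mapping, Optional

# Placeholder TEFs for illustration only; the source does not list its values,
# so replace these with the TEFs from the reference actually used.
EXAMPLE_TEF = {
    "BaA": 0.1, "Chr": 0.01, "BbF": 0.1, "BkF": 0.1,
    "BaP": 1.0, "InP": 0.1, "DBA": 1.0,
}

# The seven carcinogenic PAHs named in the text.
CARCINOGENIC_7 = ("BaA", "Chr", "BbF", "BkF", "BaP", "InP", "DBA")


def teq_bap(
    conc_ug_kg: Mapping[str, float],
    tef: Mapping[str, float],
    compounds: Optional[Iterable[str]] = None,
) -> float:
    """Return the sum of C_i * TEF_i (ug/kg) over the selected PAHs."""
    names = list(compounds) if compounds is not None else list(conc_ug_kg)
    return sum(conc_ug_kg[name] * tef[name] for name in names)


# Hypothetical concentrations (ug/kg) for a single soil sample.
sample = {
    "BaA": 10.0, "Chr": 20.0, "BbF": 5.0, "BkF": 5.0,
    "BaP": 2.0, "InP": 3.0, "DBA": 1.0,
}
teq7 = teq_bap(sample, EXAMPLE_TEF, CARCINOGENIC_7)
print(f"TEQBaP7 = {teq7:.2f} ug/kg")
```

Passing the TEF table as an argument keeps the calculation independent of any particular TEF scheme, so the same function serves for TEQBaP7 and TEQBaP16 once a 16-compound table is supplied.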

Results and discussion
Concentration of TPHs in surface soils. Statistical results for TPH concentrations in the different oilfields are shown in Fig. 2, and the grid-based regional distribution of TPHs in YC Oilfield surface soils (Y6-Y25) is shown in Fig. 3. Results are given as the mean of triplicate analyses of each sample. All three oilfields suffered from varying degrees of petroleum pollution, and 60.92% of the 47 sampling points had TPH concentrations significantly higher than the soil critical value (500 mg/kg). The average TPH concentration followed the order SL Oilfield (average: 5.36 × 10³ mg/kg) > NY Oilfield (average: 1.73 × 10³ mg/kg) > YC Oilfield (average: 1.37 × 10³ mg/kg). The highest TPH concentrations were found in SL Oilfield surface soils, ranging from 1.21 × 10² to 6.66 × 10⁴ mg/kg, and NY Oilfield had the second highest, ranging from 15.82 to 7.42 × 10³ mg/kg. TPH concentrations in YC Oilfield ranged from 12.34 to 5.38 × 10³ mg/kg. The petroleum contamination derived mainly from abandoned and working oil wells. Soils S4 and S8, collected near an abandoned oil well and a working oil well, respectively, had the highest TPH concentrations, up to 5.28 × 10⁴ and 6.66 × 10⁴ mg/kg. Y1 and N8, near abandoned oil wells, also had high TPH concentrations of 5.39 × 10³ and 7.42 × 10³ mg/kg, respectively. Pollution caused by crude oil spilled onto the ground during exploitation has been a serious problem in oilfield areas. Our previous research reported that the TPH content of Dagang Oilfield soils collected adjacent to working oil wells was about 20-fold higher than that of corn field and living area soils [25]. The concentration contour map of TPHs in YC Oilfield obtained by grid sampling showed that regional pollution in the northwest and southeast areas was more serious than at other sites.
Y6, near the gas station, and Y15, Y21, and Y23, adjacent to working oil wells, had higher TPH concentrations (2.12 × 10³-5.34 × 10³ mg/kg) than the other farmland and grassland soils. A previous study reported TPH concentrations of 7.0 × 10²-4.0 × 10³ mg/kg in oil exploitation areas of the loess plateau region (34°20′N, 107°10′E), a pollution level similar to that found in this study [26]. The percentage composition of total PAHs, SHs, and polar components of the petroleum hydrocarbons is shown in Table 1. In general, saturated hydrocarbons were the dominant petroleum component in all soils, accounting for more than 50%. However, the proportions of PAHs and SHs in contaminated soils adjacent to working and abandoned oil wells were significantly different (p < 0.05). In detail, the proportion of SHs was highest in soils adjacent to working oil wells, at 70.2-75.9%. In soils near abandoned oil wells, SHs accounted for 52.2-61.1%, decreasing as the time since abandonment lengthened. In contrast, the proportion of PAHs was highest in soils near abandoned oil wells, at 30.4-34.4%. The proportions of petroleum hydrocarbon fractions can indicate the extent of biodegradation and natural aging. Many researchers have found that SHs are the most easily degraded by indigenous microbes, followed by PAHs, which become more recalcitrant as the number of benzene rings increases. Polar components (resins and asphaltenes) are harder to metabolize than saturated and aromatic hydrocarbons [27]. This study confirms that SHs were degraded first and that PAHs and polar components accumulated in soils adjacent to abandoned wells.
This finding agrees with Wang's research, in which the saturated hydrocarbon fraction gradually decreased and the aromatic fractions and polar components significantly accumulated over the duration of oil well use from 1987 to 2014 [21]. Devi et al. also confirmed that higher concentrations of aromatic hydrocarbons were more frequently present in soils near an abandoned oil well than at five other sampling sites surrounded by group gathering stations in Sivasagar district of Assam, India [28]. There are more than 20 million abandoned oil wells worldwide, so investigating the contamination level of aromatic hydrocarbons and other toxic pollutants benefits the remediation and reuse of abandoned oil wells, for example for geothermal power generation [29,30].
Fingerprint characteristics of the geochemical indices of n-alkanes. The concentrations of n-alkanes in the three oilfield soils are shown in Fig. 4 and Supporting Information, Table S4. The n-alkanes were divided into three groups according to carbon chain length, i.e. short chain alkanes (C8-C19), medium chain alkanes (C20-C30), and long chain alkanes (C31-C40). The concentration of total n-alkanes differed greatly among the three oilfield soils (p < 0.05). Generally, the concentration of n-alkanes followed a pattern similar to that of residual TPHs. S8 and S4, Y1 and Y23, and N8, N9, and N10, collected from oilfield exploitation areas, had higher n-alkane concentrations than other soils in their respective oilfields. Notably, the n-alkane content of grassland soils Y10, Y20, and Y24 was higher than that of other grassland and farmland soils in YC Oilfield. Previous studies found that some grassland and farmland soils were also polluted by petroleum hydrocarbons, for example in the Sfax and Usinsk Oilfields [31,32].
The composition of low molecular weight (LMW) and high molecular weight (HMW) n-alkanes is an important indicator of petroleum migration and weathering, and a high LMW/HMW ratio indicates new petroleum contaminant input [33]. In this study, LMW n-alkanes (C8-C19) made up more than 45% of the n-alkanes in all soils adjacent to working oil wells, compared with less than 20% in soils adjacent to abandoned wells. The HMW n-alkanes, comprising medium chain alkanes (C20-C30) and long chain alkanes (C31-C40), accumulated at the abandoned wells, whereas the short chain alkanes (C8-C19) had been biodegraded or volatilized to some extent. This result agrees with Wang's study, which found a lower LMW/HMW ratio (less than 1) at abandoned well sites than at working wells in the Yellow River Delta [34]. This conclusion was further supported by the geochemical indices of n-alkanes in this study.
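As a rough illustration of the LMW/HMW comparison described above, the following sketch computes the LMW share and the LMW/HMW ratio from a per-carbon-number concentration table. The carbon-number ranges (C8-C19 as LMW, C20-C40 as HMW) follow the text; the function name `lmw_hmw` and the input values are hypothetical.

```python
from typing import Mapping, Tuple


def lmw_hmw(conc_by_carbon: Mapping[int, float]) -> Tuple[float, float]:
    """Return (LMW share of total n-alkanes, LMW/HMW ratio).

    Keys are carbon numbers; values are concentrations in any consistent unit.
    LMW = C8-C19 and HMW = C20-C40, following the grouping in the text.
    """
    lmw = sum(c for n, c in conc_by_carbon.items() if 8 <= n <= 19)
    hmw = sum(c for n, c in conc_by_carbon.items() if 20 <= n <= 40)
    total = lmw + hmw
    if total <= 0:
        raise ValueError("no n-alkanes in the C8-C40 range")
    # With no HMW alkanes the ratio is unbounded; report it as infinity.
    ratio = lmw / hmw if hmw > 0 else float("inf")
    return lmw / total, ratio


# Hypothetical n-alkane profile for one soil sample.
profile = {10: 30.0, 16: 20.0, 24: 40.0, 35: 10.0}
share, ratio = lmw_hmw(profile)
print(f"LMW share = {share:.0%}, LMW/HMW = {ratio:.2f}")
```

Under the pattern reported in the text, a share above about 45% would be read as fresh input and a share below about 20% as a weathered, abandoned-well profile.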
The geochemical indices of n-alkanes are shown in Table 2. These indices are used to fingerprint the oil contamination level, the biogenic or petrogenic source of the petroleum, and the extent of biodegradation [35]. The value of ∑n-alk/n-C16 in all soils collected near abandoned and working oil wells was less than 30, indicating that oil exploitation was the dominant pollutant source at these sites. The value of ∑n-alk/n-C16 at the petroleum processing plant (S2), gas stations (S5, Y4), and oil transport area (N6) ranged from 30 to 50, demonstrating a contamination potential in the petro-related industries. Petroleum contamination was slight in the soils far from oil wells (except Y10, Y20, and Y24), where the value of ∑n-alk/n-C16 was more than 50. The value of ∑n-alk/n-C16 in Y10, Y20, and Y24 was less than 30, revealing that these sites were contaminated by petroleum hydrocarbons. These sites are located in the drainage basin of the Yanhe River, so this contamination might result from petroleum migration caused by rainfall runoff in part of the area [36]. The carbon preference index (CPI) is calculated as the ratio of odd to even carbon-numbered n-alkanes. CPI analysis reveals the absence or presence of anthropogenic inputs of petroleum hydrocarbons: values close to 1 indicate petroleum inputs, and higher values (mostly between 3 and 6) indicate that the hydrocarbons originated from vascular plant epicuticular wax [37]. In this study, CPI values in the contaminated soils ranged from 1 to 3, confirming that the pollutants were dominated by petrogenic hydrocarbons with a minor biogenic contribution. CPI values of soils adjacent to working oil wells were close to 1, indicating new petroleum input.
A similar conclusion was drawn in a previous study, in which the CPI values of surface soils near a working oil well were close to 1, whereas uncontaminated control soils had higher CPI values [35]. In most farmland and grassland soils, the dominant hydrocarbons derived from vascular plant epicuticular wax, as CPI values ranged from 3 to 6.
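The two n-alkane indices discussed above can be sketched as follows. This is an illustrative reading, not the authors' procedure: CPI is taken as the plain ratio of summed odd to summed even n-alkanes, as the text defines it (other published CPI formulas differ), and the handling of values exactly at 30 or 50 is an assumption because the text does not specify it. All function names and input values are hypothetical.

```python
from typing import Mapping


def sum_alk_over_c16(conc_by_carbon: Mapping[int, float]) -> float:
    """Sum of all n-alkanes divided by the n-C16 concentration."""
    c16 = conc_by_carbon.get(16, 0.0)
    if c16 <= 0:
        raise ValueError("n-C16 not detected; index undefined")
    return sum(conc_by_carbon.values()) / c16


def cpi(conc_by_carbon: Mapping[int, float]) -> float:
    """Ratio of odd to even carbon-numbered n-alkanes."""
    odd = sum(c for n, c in conc_by_carbon.items() if n % 2 == 1)
    even = sum(c for n, c in conc_by_carbon.items() if n % 2 == 0)
    if even <= 0:
        raise ValueError("no even-numbered n-alkanes; CPI undefined")
    return odd / even


def classify_alk_index(value: float) -> str:
    """Map the sum-n-alk/n-C16 value to the three bands used in the text.

    Boundary handling (exactly 30 or 50) is an assumption.
    """
    if value < 30:
        return "petroleum-contaminated"
    if value <= 50:
        return "contamination potential"
    return "little petroleum input"


# Hypothetical n-alkane concentrations keyed by carbon number.
profile = {15: 4.0, 16: 5.0, 17: 6.0, 18: 5.0}
index = sum_alk_over_c16(profile)
print(index, classify_alk_index(index), cpi(profile))
```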
Pristane (Pr) and phytane (Ph) are representative acyclic isoprenoid alkanes, and the ratios n-C17/Pr, n-C18/Ph, and Pr/Ph are commonly used as indicators of the extent of weathering and biodegradation of petroleum [35]. The acyclic isoprenoid alkanes were not detected in most farmland and grassland soils, which were not contaminated by petroleum. The ratios of n-C17/Pr (0.76 to 0.93), n-C18/Ph (0.53 to 0.95), and Pr/Ph (0.83 to 0.99) in soils adjacent to abandoned oil wells were lower than those in soils adjacent to working oil wells, where n-C17/Pr ranged from 2.33 to 2.88, n-C18/Ph from 1.93 to 2.96, and Pr/Ph from 1.03 to 1.27. This indicates that more hydrocarbon biodegradation had occurred in the long-term oil-contaminated soils than in the freshly contaminated soils [38].
Table 2. The geochemical indices of n-alkanes in different oilfield soils. ∑n-alk/n-C16: sum of n-alkanes (C8-C40) over n-alkane C16; CPI: odd to even carbon preference index from n-C8 to n-C40; C17/Pr: ratio of n-alkane C17 to pristane; C18/Ph: ratio of n-alkane C18 to phytane; Pr/Ph: ratio of pristane to phytane; nd: not detected.
Concentration and source characters of PAHs. The total concentrations of the 16 PAHs (∑PAHs) in the three oilfield soils are shown in Fig. 5, and the concentrations of PAHs in YC Oilfield soils obtained by grid sampling were […] [39]. In this research, soils in the three oilfields tended to be heavily contaminated, except for farmland and grassland soils, which had ∑PAHs concentrations below 200 μg/kg and were less affected by oil-related activities. PAHs in the environment derive mainly from energy utilization and human activities, and their composition can reveal whether their source is pyrolytic or petrogenic. Studies have shown that low-ring PAHs (two to three benzene rings) derive mainly from petroleum input, such as oil leakage and crude oil spilled onto the ground.
High-ring PAHs (four to six benzene rings) derive mainly from combustion sources, including the incomplete combustion of biomass such as grass and straw and of fossil fuels (petroleum and coal) [40]. In this study, low-ring PAHs accounted for 29.69-67.14% (SL), 32.89-38.93% (YC), and 15.07-59.4% (NY) of the PAHs in the three oilfields. In particular, the proportions of low-ring PAHs were highest in S4, S7-S12, N8, N9, Y1, and Y5, which were collected near working and abandoned oil wells, indicating that petrogenic PAHs were the dominant source in these areas. Soils collected from petrochemical-related sites, such as the petroleum processing plant (S2), gas stations (S5, Y4), and transport area (N6), as well as farmland (S1) and grassland soils (Y1-Y14, Y22, and Y25), had a higher proportion of high-ring PAHs, at more than 53%, indicating that pyrolytic PAHs were the dominant source of aromatic components at these sampling sites.
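The low-ring versus high-ring split used above can be illustrated with a short sketch. The ring counts below follow the commonly used grouping of the 16 priority PAHs and are an assumption here, since the text does not tabulate them; full compound names are used to avoid abbreviation clashes, and the function name and sample values are hypothetical.

```python
from typing import Mapping

# Ring counts under the conventional grouping of the 16 priority PAHs
# (an assumption; the source does not list them).
RING_COUNT = {
    "naphthalene": 2,
    "acenaphthylene": 3, "acenaphthene": 3, "fluorene": 3,
    "phenanthrene": 3, "anthracene": 3,
    "fluoranthene": 4, "pyrene": 4,
    "benz[a]anthracene": 4, "chrysene": 4,
    "benzo[b]fluoranthene": 5, "benzo[k]fluoranthene": 5,
    "benzo[a]pyrene": 5, "dibenz[a,h]anthracene": 5,
    "indeno[1,2,3-cd]pyrene": 6, "benzo[ghi]perylene": 6,
}


def low_ring_fraction(conc: Mapping[str, float]) -> float:
    """Fraction of total PAH concentration carried by 2-3 ring compounds."""
    total = sum(conc.values())
    if total <= 0:
        raise ValueError("total PAH concentration must be positive")
    low = sum(c for name, c in conc.items() if RING_COUNT[name] <= 3)
    return low / total


# Hypothetical PAH concentrations (ug/kg) for one sample.
sample = {
    "naphthalene": 10.0, "phenanthrene": 30.0,
    "pyrene": 40.0, "benzo[a]pyrene": 20.0,
}
fraction = low_ring_fraction(sample)
print(f"low-ring share = {fraction:.0%}")
```

A sample like this one, with 60% high-ring PAHs, would fall in the group the text describes as dominated by pyrolytic sources.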
The ratios of indeno[1,2,3-cd]pyrene to indeno[1,2,3-cd]pyrene plus benzo[ghi]perylene [IcdP/(IcdP + BghiP)], benz[a]anthracene to benz[a]anthracene plus chrysene [BaA/(BaA + Chr)], fluoranthene to fluoranthene plus pyrene [Flu/(Flu + Pyr)], and anthracene to anthracene plus phenanthrene [Ant/(Ant + Phe)] were used to clarify the possible sources of PAHs. According to previous research, a Flu/(Flu + Pyr) ratio < 0.4 indicates that PAHs derive from a petroleum source, a ratio between 0.4 and 0.5 indicates petroleum combustion, and a ratio > 0.5 indicates biomass and coal combustion. An Ant/(Ant + Phe) ratio less than 0.1 means that PAHs derive from petroleum; otherwise they derive from a combustion source. An IcdP/(IcdP + BghiP) ratio < 0.2 indicates that PAHs derive mainly from a petroleum source, a ratio between 0.2 and 0.5 indicates petroleum combustion, and a ratio > 0.5 indicates biomass and coal combustion. A BaA/(BaA + Chr) ratio < 0.2 indicates that PAHs derive mainly from a petroleum source, a ratio between 0.2 and 0.35 indicates a mixed source including petroleum and combustion, and a ratio > 0.35 indicates a combustion source [41]. A plot of the Flu/(Flu + Pyr) and Ant/(Ant + Phe) ratios is shown in Fig. 6a, and the IcdP/(IcdP + BghiP) and BaA/(BaA + Chr) ratios are shown in Fig. 6b. PAHs in SL Oilfield originate mainly from petroleum and petroleum combustion. This is reasonably explained by the fact that most SL Oilfield soils were collected near oil exploitation areas and petroleum-related industry, such as petroleum refining, oil transportation, and gas stations. Therefore, upgrading oil exploitation equipment and improving transport efficiency are the critical ways to control PAH pollution caused by oil spilling and landing in SL Oilfield. The sources of PAHs in NY and YC Oilfields were complex, owing to the variety of sampling environments.
In soils sampled near abandoned and working oil wells, petroleum was the dominant source, whereas mixed sources (petroleum and petroleum combustion) contributed most to the PAHs elsewhere, except in farmland and grassland soils far from petro-related activities. Avoiding oil spill accidents and limiting tailpipe emissions from transport vehicles are effective PAH source control methods in the petro-related areas of NY and YC Oilfields. Meanwhile, oil-blocking isolation zones should be set up to prevent PAHs from spreading to the surrounding grasslands and farmlands, especially in the lower reaches of the Yanhe River. The PAHs in farmland and grassland soils (S1, N1, N7, Y2, Y11, Y12, Y14, Y22, Y25) derived mainly from biomass and coal combustion, indicating that winter heating and the burning of biomass (wood, grass, and straw) contributed greatly to PAH accumulation. Therefore, the most important management policies are to prohibit the burning of wasteland, to raise awareness of environment-friendly heating among local people, and to explore efficient uses of biomass, such as turning straw into fertilizer and feed [42] or producing bio-energy [43]. Regional industry strongly influences the origin of PAHs in this study, and the distribution of PAHs varies among regions. Previous studies have described that PAHs derive predominantly from anthropogenic sources, such as oil extraction, petroleum processing and refining, municipal waste incineration, automobile exhaust emissions, and coal and biomass (wood, grass, and straw) burning [44]. Overall, the source characteristics of PAHs based on isomer ratios and ring numbers can provide a critical basis for source control of PAH contamination.
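The isomer-ratio thresholds quoted in this section can be expressed as a small classifier. This is a minimal sketch rather than the authors' workflow: it takes 0.35 as the upper bound of the BaA/(BaA + Chr) mixed-source band, consistent with the "> 0.35" combustion cutoff, and the treatment of values exactly on a threshold is an assumption. Function names and sample concentrations are hypothetical.

```python
def isomer_ratio(a: float, b: float) -> float:
    """Return a / (a + b) for two PAH concentrations in the same unit."""
    total = a + b
    if total <= 0:
        raise ValueError("both concentrations are zero; ratio undefined")
    return a / total


def classify_flu_pyr(r: float) -> str:
    """Flu/(Flu + Pyr): < 0.4, 0.4-0.5, > 0.5."""
    if r < 0.4:
        return "petroleum"
    if r <= 0.5:
        return "petroleum combustion"
    return "biomass and coal combustion"


def classify_ant_phe(r: float) -> str:
    """Ant/(Ant + Phe): < 0.1 petroleum, otherwise combustion."""
    return "petroleum" if r < 0.1 else "combustion"


def classify_icdp_bghip(r: float) -> str:
    """IcdP/(IcdP + BghiP): < 0.2, 0.2-0.5, > 0.5."""
    if r < 0.2:
        return "petroleum"
    if r <= 0.5:
        return "petroleum combustion"
    return "biomass and coal combustion"


def classify_baa_chr(r: float) -> str:
    """BaA/(BaA + Chr): < 0.2, 0.2-0.35 (assumed upper bound), > 0.35."""
    if r < 0.2:
        return "petroleum"
    if r <= 0.35:
        return "mixed petroleum and combustion"
    return "combustion"


# Hypothetical fluoranthene and pyrene concentrations (ug/kg).
r_flu = isomer_ratio(3.0, 7.0)
print(f"Flu/(Flu + Pyr) = {r_flu:.2f} -> {classify_flu_pyr(r_flu)}")
```

Keeping one function per ratio makes each threshold set easy to check against the cited reference, and a sample can then be assigned a source only where the ratios agree.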
Ecological risk assessment of PAHs. The occurrence of PAHs, especially those derived from anthropogenic activities, has raised great concern over ecological and health risks because of their carcinogenic, teratogenic, and mutagenic properties [44]. Descriptive statistics of the TEQBaP of PAHs at the different sampling sites are shown in Table 3. […] [46]. Therefore, contamination control in these areas should focus first on the individual PAHs BaP, DBA, and BbF. In addition, the ecological risk over abandonment times of 0-15 years was assessed, and descriptive statistics of the TEQBaP of PAHs are shown in Supporting Information, Table S6. The highest TEQs of ∑PAH16 and ∑PAH7, with means of 1422.27 μg/kg and 1400.48 μg/kg, respectively, were found in soils adjacent to oil wells abandoned for 0-5 years. The TEQs of ∑PAH16 and ∑PAH7 decreased with abandonment time even though the percentage proportion of PAHs increased. The TEQs of ∑PAH16 and ∑PAH7 were similar for abandonment times of 5-10 years and 10-15 years, and both remained high. This demonstrates that high ecological risk persisted in abandoned oil well areas over 15 years of abandonment and was basically stable after 5 years. Therefore, abandoned oil well areas need to be isolated to prevent PAHs from entering the surrounding environment, and remediation should combine physical-chemical technologies rather than rely on natural weathering and biological processes alone. According to the Dutch soil standard for PAHs, the TEQ of ∑PAH7 is 32.02 μg/kg, calculated from ten individual PAHs multiplied by their TEFs. In this study, the mean TEQs of ∑PAH7 in petro-related area soils and grassland soils were about 35- and 10-fold the Dutch soil value, indicating high and medium ecological risk in these soils, respectively. However, the mean TEQ of ∑PAH7 in farmland soils (18.80 μg/kg) was below the Dutch soil value, presenting a low potential ecological risk.
It should be noted that the minimum TEQ of ∑PAH7 in grassland soil was 26.24 μg/kg, below the Dutch soil value, but grassland soil is vulnerable to influence from surrounding soils with high TEQs of ∑PAH7. In this study, except for the farmland soils, the TEQs of ∑PAH7 were higher than those reported for soils in Santiago, Chile [47] and Nepal [24] and for road dust in Tianjin, China [48]. Overall, the greatest ecological risk in petro-related soils was caused by anthropogenic PAH input, such as oil leakage, oil refining, and fossil energy combustion. Preventing oil spill accidents and developing remediation methods are the main ways to reduce the ecological risks in these areas. The medium ecological risk in grassland might result from the migration of PAHs via rainfall. Therefore, establishing oil-blocking isolation zones is the critical way to control petroleum inflow into medium ecological risk areas. Even though low ecological risk was identified in farmland soils, PAH source analysis indicated that biomass combustion should be controlled in these areas.

Conclusion
This study used fingerprint analysis to reveal the occurrence characteristics and sources of petroleum hydrocarbons in soils of three geographically distinct oilfields in China. Moreover, the potential ecological risk of PAHs was assessed according to the toxic equivalent quantity of carcinogenic PAHs. The main conclusions are as follows: (1) The general oil contamination level was higher in SL Oilfield soils than in NY and YC Oilfield soils, and higher TPH concentrations were found in soils adjacent to working and abandoned oil wells as well as in some grassland soils collected from YC Oilfield; (2) The higher proportion of LMW n-alkanes in total n-alkanes in soils adjacent to working oil wells indicated new petroleum contaminant input, whereas saturated hydrocarbons were largely degraded in soils adjacent to abandoned wells, as confirmed by the lower proportion of LMW n-alkanes and the lower ratios of n-alkanes to acyclic isoprenoid alkanes; (3) In all three oilfields, the PAHs in oil exploitation areas derived mainly from petroleum, whereas the PAHs in soils collected from petrochemical-related sites, such as the petroleum processing plant, gas stations, and transport areas, derived mainly from petroleum combustion. Biomass combustion was the dominant PAH source in farmland and grassland soils far from oil exploitation activities. Petroleum control strategies were proposed, such as upgrading oil exploitation equipment in oil exploitation areas, setting up oil-blocking isolation zones in abandoned oil well areas and the drainage basin of the Yanhe River, and prohibiting the burning of wasteland in farmland and grassland; (4) High, medium, and low ecological risk from carcinogenic PAHs was found in petro-related area soils, grassland soils, and farmland soils, respectively. High ecological risk persisted in abandoned oil well areas over 15 years of abandonment and was basically stable after 5 years.

Data availability
The authors declare that all data supporting the findings of this study are available within the article and its supplementary information files.  www.nature.com/scientificreports/